15 research outputs found

    Explaining Explanation: An Empirical Study on Explanation in Code Reviews

    Code review is an important process for quality assurance in software development. For an effective code review, the reviewers must explain their feedback to enable the authors of the code change to act on it. However, the explanation needs may differ among developers, who may require different types of explanations. It is therefore crucial to understand what kind of explanations reviewers usually use in code reviews. To the best of our knowledge, no study published to date has analyzed the types of explanations used in code review. In this study, we present the first analysis of explanations in useful code reviews. We extracted a set of code reviews based on their usefulness and labeled them based on whether they contained an explanation, a solution, or both a proposed solution and an explanation thereof. Based on our analysis, we found that a significant portion of the code review comments (46%) only include solutions without providing an explanation. We further investigated the remaining 54% of code review comments containing an explanation and conducted an open card sorting to categorize the reviewers' explanations. We distilled seven distinct categories of explanations based on the expression forms developers used. We then used large language models, specifically ChatGPT, to assist developers in getting a code review explanation that suits their preferences. Specifically, we created prompts to transform a code review explanation into a specific type of explanation. Our evaluation results show that ChatGPT correctly generated the specified type of explanation in 88/90 cases and that 89/90 of the cases have the correct explanation. Overall, our study provides insights into the types of explanations that developers use in code review and showcases how ChatGPT can be leveraged during the code review process to generate a specific type of explanation.

    APISENS- Sentiment Scoring Tool for APIs with Crowd-Knowledge

    Utilizing pre-existing software artifacts, such as libraries and Application Programming Interfaces (APIs), is crucial for software development efficiency. However, the abundance of artifacts that provide similar functionality can lead to confusion among developers, making proper selection and implementation a challenge. Through our preliminary investigation, we found that utilizing the collective knowledge of a crowd can greatly assist developers in acquiring a thorough and complete understanding of the complexities involved in the software development process. Because emotions are an inseparable part of human nature, they influence developers' activities. In this regard, we attempt to build a tool that can retrieve sentiment information for software APIs so that developers can determine which APIs to use for their tasks. We employ datasets from the most popular platforms (i.e., Twitter and YouTube) to build our research prototype. The source code, tool, and demo video are available on GitHub at https://github.com/FalconLK/APISens

    APIHarvest: Harvesting API Information from Various Online Sources

    Using APIs to develop software applications is the norm. APIs help developers to build applications faster as they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs that they plan to use. Developers should also make themselves aware of relevant information updates about APIs. In order to do so, developers need to find and keep track of relevant information about the APIs that they are concerned with. Yet, the API information is scattered across various online sources, which makes it difficult to track by hand. Moreover, identifying content that is related to an API is not trivial. Motivated by these challenges, in this work, we introduce a tool named APIHarvest that aims to ease the process of finding API information from various online sources. APIHarvest is built on works that link APIs or libraries to various online sources. It supports finding API information on GitHub repositories, Stack Overflow posts, tweets, YouTube videos, and common vulnerability and exposure (CVE) entries, and is extensible to support other sources.

    Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

    In this paper, we systematically study the quality of 4,066 ChatGPT-generated code snippets implemented in two popular programming languages, i.e., Java and Python, for 2,033 programming tasks. The goal of this work is threefold. First, we analyze the correctness of ChatGPT on code generation tasks and uncover the factors that influence its effectiveness, including task difficulty, programming language, the time at which tasks were introduced, and program size. Second, we identify and characterize potential issues with the quality of ChatGPT-generated code. Last, we provide insights into how these issues can be mitigated. Experiments highlight that out of 4,066 programs generated by ChatGPT, 2,757 programs are deemed correct, 1,081 programs provide wrong outputs, and 177 programs contain compilation or runtime errors. Additionally, we analyze other characteristics of the generated code through static analysis tools, such as code style and maintainability, and find that 1,933 ChatGPT-generated code snippets suffer from maintainability issues. Subsequently, we investigate ChatGPT's self-debugging ability and its interaction with static analysis tools to fix the errors uncovered in the previous step. Experiments suggest that ChatGPT can partially address these challenges, improving code quality by more than 20%, but there are still limitations and opportunities for improvement. Overall, our study provides valuable insights into the current limitations of ChatGPT and offers a roadmap for future research and development efforts to enhance the code generation capabilities of AI models like ChatGPT.

    CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports

    Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases, which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library, and human analysis is required to determine all the relevant libraries. Human analysis may be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML). While state-of-the-art XML techniques showed promising performance, their experimental settings do not reflect what happens in practice. Previous studies randomly split the vulnerability report data for training and testing their models without considering the chronological order of the reports. This may unduly train the models on chronologically newer reports while testing the models on chronologically older ones. However, in practice, one often receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially, e.g., F1 decreased from 0.7 to 0.24 when the chronological order of vulnerability reports is taken into account. We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, CHRONOS enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, CHRONOS exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of...
    Comment: Accepted to the Technical Track of ICSE 202
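
    The difference between a random split and the chronological split described in this abstract can be illustrated with a minimal sketch. It is not the CHRONOS implementation; the report records, field names (`published`, `libraries`), and helper functions are hypothetical and chosen only for illustration.

```python
from datetime import date


def chronological_split(reports, n_train):
    """Sort reports by publication date and cut once, so every
    test report is newer than every training report."""
    ordered = sorted(reports, key=lambda r: r["published"])
    return ordered[:n_train], ordered[n_train:]


def unseen_libraries(train, test):
    """Libraries that appear in the test reports but never in training."""
    seen = {lib for r in train for lib in r["libraries"]}
    return {lib for r in test for lib in r["libraries"]} - seen


# Hypothetical vulnerability reports (illustrative data only).
reports = [
    {"id": "R1", "published": date(2019, 3, 1), "libraries": ["libA"]},
    {"id": "R2", "published": date(2021, 7, 9), "libraries": ["libC"]},
    {"id": "R3", "published": date(2020, 1, 15), "libraries": ["libA", "libB"]},
    {"id": "R4", "published": date(2018, 11, 2), "libraries": ["libB"]},
    {"id": "R5", "published": date(2022, 2, 20), "libraries": ["libD"]},
]

train, test = chronological_split(reports, n_train=3)
print([r["id"] for r in train])
print([r["id"] for r in test])
print(sorted(unseen_libraries(train, test)))
```

    In this toy data the two newest reports mention only libraries that never occur in the training portion, which is the situation that motivates a zero-shot approach.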

    NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

    Machine learning (ML) has gained much attention and been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
    Comment: Accepted by MSR 202

    Multi-Granularity Detector for Vulnerability Fixes

    With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users of such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying such commits is highly challenging, as only a very small minority of commits are vulnerability fixing. Moreover, code changes can be noisy and difficult to analyze. We observe that noise can occur at different levels of detail, making it challenging to detect vulnerability fixes accurately. To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unlike prior works, MiDas constructs a separate neural network for each level of code change granularity (commit-level, file-level, hunk-level, and line-level), following their natural organization. It then utilizes an ensemble model that combines all base models to generate the final prediction. This design allows MiDas to better handle the noisy and highly imbalanced nature of vulnerability-fixing commit data. Additionally, to reduce the human effort required to inspect code changes, we have designed an effort-aware adjustment for MiDas's outputs based on commit length. The evaluation results demonstrate that MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also outperforms the state-of-the-art baseline, achieving improvements of up to 28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively.
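
    The idea of combining one base model per granularity into a single prediction can be sketched as follows. The abstract does not say how the ensemble combines the base scores, so the weighted average below is an assumption made purely for illustration, and the function name, weights, and scores are hypothetical rather than taken from MiDas.

```python
GRANULARITIES = ("commit", "file", "hunk", "line")


def ensemble_score(base_scores, weights=None):
    """Combine per-granularity probabilities into one score.

    Assumption: a weighted average stands in for the ensemble model,
    whose actual form is not given in the abstract.
    """
    if weights is None:
        weights = {g: 1.0 for g in GRANULARITIES}
    total = sum(weights[g] for g in GRANULARITIES)
    return sum(weights[g] * base_scores[g] for g in GRANULARITIES) / total


# Hypothetical outputs of the four base models for one commit.
scores = {"commit": 0.9, "file": 0.7, "hunk": 0.8, "line": 0.6}

combined = ensemble_score(scores)
commit_only = ensemble_score(
    scores, weights={"commit": 1.0, "file": 0.0, "hunk": 0.0, "line": 0.0}
)
print(round(combined, 2), round(commit_only, 2))
```

    With equal weights each granularity contributes equally; setting all but one weight to zero reduces the ensemble to a single base model, which makes it easy to compare against a one-granularity baseline.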